88 research outputs found

    Towards efficient data integration and knowledge management in the Agronomic domain

    Today, the revolution in empirical technologies has generated vast amounts of data. This data deluge has created an urgent need to integrate the data into a unified, panoramic view. To this end, information systems play a central role in managing and integrating these data, helping biologists exploit the integrated information to extract new knowledge. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) maintains public information systems in which a variety of domain-specific data are integrated. Efforts are currently under way to expose the IFB plant bioinformatics resources as RDF, using domain-specific ontologies and metadata. Here, we present an overview of the project and its progress.

    GeneFarm, structural and functional annotation of Arabidopsis gene and protein families by a network of experts

    Genomic projects depend heavily on genome annotations and are limited by the current deficiencies in the published predictions of gene structure and function. It follows that improved annotation will allow better data mining of genomes, and more secure planning and design of experiments. The purpose of the GeneFarm project is to obtain homogeneous, reliable, documented and traceable annotations for Arabidopsis nuclear genes and gene products, and to enter them into an added-value database. This re-annotation project is being performed exhaustively on every member of each gene family. Performing a family-wide annotation makes the task easier and more efficient than a gene-by-gene approach, since many features obtained for one gene can be extrapolated to some or all of the other genes of a family. A complete annotation procedure based on the most efficient prediction tools available is being used by 16 partner laboratories, each contributing annotated families from its field of expertise. A database, named GeneFarm, and an associated user-friendly interface to query the annotations have been developed. More than 3000 genes distributed over 300 families have been annotated and are available at http://genoplante-info.infobiogen.fr/Genefarm/. Furthermore, collaboration with the Swiss Institute of Bioinformatics is underway to integrate the GeneFarm data into the protein knowledgebase Swiss-Prot.

    Measures for interoperability of phenotypic data: minimum information requirements and formatting

    Background: Plant phenotypic data hold a wealth of information which, when accurately analysed and linked to other data types, brings to light knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse. Results: In this paper we propose guidelines for the proper handling of information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called “Minimum Information About a Plant Phenotyping Experiment”, which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised in practice within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented. Conclusions: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.

    BrAPI-an application programming interface for plant breeding applications

    Motivation: Modern genomic breeding methods rely heavily on very large amounts of phenotyping and genotyping data, presenting new challenges in effective data management and integration. Recently, the size and complexity of datasets have increased significantly, with the result that data are often stored on multiple systems. As analyses of interest increasingly require aggregation of datasets from diverse sources, data exchange between disparate systems becomes a challenge. Results: To facilitate interoperability among breeding applications, we present the public plant Breeding Application Programming Interface (BrAPI). BrAPI is a standardized web service API specification. The development of BrAPI is a collaborative, community-based initiative involving a growing global community of over a hundred participants representing several dozen institutions and companies. Development of such a standard is recognized as a critical foundational technology for a number of important large breeding system initiatives. The focus of the first version of the API is on providing services for connecting systems and retrieving basic breeding data including germplasm, study, observation, and marker data. A number of BrAPI-enabled applications, termed BrAPPs, have been written that take advantage of the emerging support of BrAPI by many databases.

    Crop Ontology Governance and Stewardship Framework

    A governance and stewardship framework is required for the Crop Ontology Project, as it is a collaborative tool developed by a Community of Practice. Over the 12 years of its existence, it has grown significantly in scope and use. Collecting and storing plant trait data and annotating the data with ontology terms is widely accepted by the crop science community as critical to enabling data interoperability and exchange through tools such as the Breeding API (BrAPI). The Crop Ontology Community of Practice is organised around roles, curation principles and validation processes that require a formal description. A governance framework is defined by the various actors involved in the asset’s design, development and maintenance. It is complemented by a quality assurance process to ensure that trust levels, value creation, and sustainability objectives meet appropriate quality levels. The general principles underlying data governance are integrity, transparency, accountability and ownership, stewardship, standardization, change management and a robust data audit.

    The ontologies community of practice: a CGIAR initiative for Big Data in agrifood systems

    Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems require quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.